A Note on the Lasso and Related Procedures in Model Selection

Authors

  • Chenlei Leng
  • Yi Lin
  • Grace Wahba

Abstract

The Lasso, Forward Stagewise regression and Lars are closely related procedures recently proposed for linear regression problems. Each of them can produce sparse models and can be used both for estimation and for variable selection. In practical implementations these algorithms are typically tuned to achieve optimal prediction accuracy. We show that, when prediction accuracy is used as the criterion for choosing the tuning parameter, these procedures are in general not consistent in terms of variable selection; that is, the selected sets of variables do not consistently recover the true set of important variables. In particular, we show that for any sample size n, when the linear regression model contains superfluous variables and the design matrix is orthogonal, the probability that the procedures correctly identify the true set of important variables is bounded above by a constant smaller than one that does not depend on n. This result is also shown to hold for two-dimensional problems with general correlated design matrices. The results indicate that in problems where the main goal is variable selection, prediction-accuracy-based criteria alone are not sufficient. Adjustments that make the Lasso and related procedures useful and consistent for variable selection are also discussed.

Keywords: consistent model selection, Forward Stagewise regression, Lars, Lasso, variable selection
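The orthogonal-design claim can be illustrated with a small simulation. Assuming the Lasso objective (1/2n)||y - Xb||^2 + lam * ||b||_1 and a design with X^T X = n I, the Lasso estimate of each coefficient is the least-squares estimate z_j soft-thresholded at lam, and the prediction error ||X(b - beta)||^2 / n reduces to sum_j (b_j - beta_j)^2. The sketch below is our own illustration, not the authors' code: it assumes Gaussian noise, a hypothetical coefficient vector, and an idealised "oracle" tuning that picks lam to minimise the true prediction error, which is a best case for any prediction-based criterion such as cross-validation. All function names are hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    """Lasso estimate of one coefficient under an orthonormal design:
    shrink the least-squares estimate z towards zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def prediction_loss(z, beta, lam):
    """Prediction error ||X(b_lasso - beta)||^2 / n, which equals
    sum_j (S(z_j, lam) - beta_j)^2 when X^T X = n I."""
    return sum((soft_threshold(zj, lam) - bj) ** 2 for zj, bj in zip(z, beta))


def oracle_lambda(z, beta):
    """Exact minimiser over lam >= 0 of the prediction loss.

    The loss is piecewise quadratic in lam with knots at the |z_j|, so we
    check every knot plus the clipped stationary point of each piece.
    """
    knots = sorted({0.0} | {abs(zj) for zj in z})
    candidates = list(knots)
    for a, b in zip(knots, knots[1:]):
        # Coefficients that stay nonzero for every lam strictly between a and b.
        active = [abs(zj) - math.copysign(1.0, zj) * bj
                  for zj, bj in zip(z, beta) if abs(zj) > a]
        stationary = sum(active) / len(active)
        candidates.append(min(max(stationary, a), b))
    return min(candidates, key=lambda lam: prediction_loss(z, beta, lam))


def correct_selection_rate(beta, n, sigma=1.0, reps=2000, seed=0):
    """Fraction of simulated data sets in which the prediction-optimal
    Lasso selects exactly the true set of nonzero coefficients."""
    rng = random.Random(seed)
    tau = sigma / math.sqrt(n)  # sd of each least-squares estimate when X^T X = n I
    true_set = {j for j, bj in enumerate(beta) if bj != 0}
    hits = 0
    for _ in range(reps):
        z = [bj + rng.gauss(0.0, tau) for bj in beta]  # least-squares estimates
        lam = oracle_lambda(z, beta)
        selected = {j for j, zj in enumerate(z) if abs(zj) > lam}
        if selected == true_set:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    beta = [3.0, 0.0, 0.0, 0.0]  # hypothetical: one important, three superfluous variables
    for n in (100, 1_000, 10_000):
        print(n, correct_selection_rate(beta, n))
```

Because the least-squares estimates concentrate around beta at rate sigma/sqrt(n), one would expect the printed rates to level off at a value well below one rather than approach one as n grows, in line with the abstract's claim.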


Similar resources

Differenced-Based Double Shrinking in Partial Linear Models

The partial linear model is very flexible when the relation between the covariates and the responses is partly parametric and partly nonparametric. However, estimating the regression coefficients is challenging because the nonparametric component must be estimated simultaneously. As a remedy, the differencing approach, which eliminates the nonparametric component before estimating the regression coefficients, can ...

A Note on the Lasso for Gaussian Graphical Model Selection

Inspired by the success of the Lasso for regression analysis (Tibshirani, 1996), it seems attractive to estimate the graph of a multivariate normal distribution by ℓ1-norm penalised likelihood maximisation. The objective function is convex and the graph estimator can thus be computed efficiently, even for very large graphs. However, we show in this note that the resulting estimator is not consi...

Performance of Tunisian Public Hospitals: A Comparative Assessment Using the Pabón Lasso Model

Background and Objectives: Constant monitoring of healthcare organizations’ performance is an integral part of informed health policy-making. Several hospital performance assessment methods have been proposed in the literature. Pabon Lasso Model offers a fast and convenient method for comparative evaluation of hospital performance. This study aimed to evaluate the relative performance of hospit...

Association of lifestyle with metabolic syndrome and non-Alcoholic fatty liver in children and adolescence

Introduction: Identifying the factors related to non-alcoholic fatty liver disease in children and adolescents helps us find appropriate methods for the prevention and control of chronic diseases. Methods: This cross-sectional and analytic study comprised 962 children and adolescents, aged 6-18 years, in Isfahan in 2008. Variables related to lifestyle and metabolic syndromes related...

A Comprehensive Model for R and D Project Portfolio Selection with Zero-One Linear Goal-Programming (RESEARCH NOTE)

Technology centered organizations must be able to identify promising new products or process improvements at an early stage so that the necessary resources can be allocated to those activities. It is essential to invest in targeted research and development (R and D) projects as opposed to a wide range of ideas so that resources can be focused on successful outcomes. The selection of the most ap...

Publication date: 2004